Abstract: 

Locating and retrieving specific data from the World Wide Web (WWW) 
is an important problem. Existing search engines often return too much 
useless data and are generally incapable of automatically extracting 
specific information such as names and email addresses. We describe 
WIRE, a WWW-based information retrieval and extraction system whose 
goal is to accurately retrieve and organize specific information from the 
World Wide Web. WIRE employs several innovative techniques. First, 
queries of WIRE are tree structured. This not only provides an order in 
which Web pages are to be searched/retrieved but also provides a context 
for more accurate retrieval. Second, WIRE employs a library of search 
templates based on the structure of HTML files to extract specific 
information. These templates can be complemented by user-provided 
search examples and patterns for better results. Third, WIRE has a filter 
mechanism that removes undesired information, further improving retrieval 
accuracy. 
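
As an illustration only, the template-plus-filter idea can be sketched in 
Python. The abstract does not describe WIRE's actual templates, so the 
class EmailTemplate, the function extract_emails, and the reject parameter 
below are hypothetical names, and the "template" is simply an assumption: 
email addresses appear either in mailto: links or in visible text. 

```python
import re
from html.parser import HTMLParser

# Deliberately simple address pattern; an assumption, not WIRE's own rule.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


class EmailTemplate(HTMLParser):
    """Hypothetical template: collect addresses from mailto: links and text."""

    def __init__(self):
        super().__init__()
        self.found = []

    def handle_starttag(self, tag, attrs):
        # Structural cue: <a href="mailto:..."> anchors.
        if tag == "a":
            href = dict(attrs).get("href") or ""
            if href.lower().startswith("mailto:"):
                self._add(href[len("mailto:"):].split("?")[0])

    def handle_data(self, data):
        # Fallback cue: address-shaped strings in visible text.
        for match in EMAIL_RE.findall(data):
            self._add(match)

    def _add(self, address):
        if address not in self.found:
            self.found.append(address)


def extract_emails(html, reject=()):
    """Apply the template, then filter out addresses containing a rejected substring."""
    parser = EmailTemplate()
    parser.feed(html)
    parser.close()
    return [a for a in parser.found if not any(r in a for r in reject)]
```

For example, calling extract_emails(page, reject=("webmaster",)) would keep 
personal addresses found on a page while dropping generic site contacts. 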
Abstract: 

Similarity searching is an important tool for many biological scientists. 
Various computer implementations (BLAST, FASTA, Smith-Waterman) 
are used by scientists to analyze their sequences of interest and to find 
identities (perfect matches) or similarities (statistically significant 
matches) between their query sequences and large databases such as 
GenBank. Search engines currently return brief annotations and 
alignments ranked in order of statistical significance or raw similarity 
score. However, it is frequently not the top-scoring similarities that bring 
important new information to the investigating scientist, but the content 
of the annotation or similarity "hits" at any significant score. The Gene 
Alert algorithm applies additional filtering and a user-weighted keyword 
search to the BLAST output, parsing it into a form customized 
to the user. There are three components to the Gene Alert 
implementation as it is currently operating: an organized file structure, a 
BLAST engine, and a parser written in the PERL scripting language. The 
file structure was designed to place code and database components in 
logical positions and to facilitate future complete automation of the Gene 
Alert and similarity search system. Shown here is the file structure within 
the UNIX environment. 
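
The filtering and user-weighted keyword step might look roughly like the 
following Python sketch. It is not the Gene Alert parser (which the 
abstract says is written in PERL), and it does not parse real BLAST output: 
the function rank_hits, the (description, evalue) tuple layout, and the 
max_evalue cutoff are all assumptions made for illustration. 

```python
def rank_hits(hits, keyword_weights, max_evalue=1e-5):
    """Keep significant hits and rank them by user keyword weights.

    hits: list of (description, evalue) tuples (assumed, pre-parsed input).
    keyword_weights: dict mapping a keyword to the user's weight for it.
    Returns (score, description, evalue) tuples, highest score first.
    """
    ranked = []
    for description, evalue in hits:
        if evalue > max_evalue:
            continue  # not statistically significant under the assumed cutoff
        text = description.lower()
        # Sum the weights of every keyword found in the annotation text.
        score = sum(w for kw, w in keyword_weights.items() if kw.lower() in text)
        if score > 0:
            ranked.append((score, description, evalue))
    # Highest keyword score first; ties broken by smaller E-value.
    ranked.sort(key=lambda item: (-item[0], item[2]))
    return ranked
```

Under this scheme a hit with a modest but significant score can outrank 
the top-scoring alignment when its annotation matches the keywords the 
user weighted most heavily. 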
Abstract: 

With the exponentially growing amount of information available on the 
Internet, the task of retrieving documents of interest has become 
increasingly difficult. Search engines usually return more than 1,500 
results per query, yet of the top twenty results, only half turn out 
to be relevant to the user. One reason for this is that Web queries are in 
general very short and give an incomplete specification of individual 
users' information needs. This paper explores ways of incorporating users' 
interests into the search process to improve the results. The user profiles 
are structured as a concept hierarchy of 4,400 nodes. These are populated 
by 'watching over a user's shoulder' while the user is surfing. No explicit 
feedback is necessary. The profiles are shown to converge and to reflect 
the actual interests quite well. One possible deployment of the profiles is 
investigated: re-ranking and filtering search results. Increases in 
performance are moderate but noticeable and show that fully automatic 
creation of large hierarchical user profiles is possible. 
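
Re-ranking and filtering with a profile can be sketched as follows. The 
abstract does not give the scoring formula, so this Python example simply 
assumes that each result has already been classified into concepts with a 
similarity value, and that the profile holds one interest weight per 
concept; rerank, profile, and min_score are hypothetical names. 

```python
def rerank(results, profile, min_score=0.0):
    """Re-rank and filter search results using a user profile.

    results: list of (title, {concept: similarity}) in the engine's order.
    profile: dict mapping a concept name to the user's interest weight.
    Results whose interest score is not above min_score are filtered out.
    """
    scored = []
    for position, (title, concepts) in enumerate(results):
        # Assumed score: profile weight times concept similarity, summed.
        interest = sum(profile.get(c, 0.0) * s for c, s in concepts.items())
        if interest > min_score:
            scored.append((interest, position, title))
    # Highest interest first; ties keep the engine's original order.
    scored.sort(key=lambda item: (-item[0], item[1]))
    return [title for _, _, title in scored]
```

A result the engine ranked first but that matches none of the user's 
concepts is dropped, while lower-ranked results on topics the user 
follows move up. 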